
5 research outputs found

Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

Author: Shi Yanchen
Tang Youze
Xiao Xiaokui
Publication date: 29/04/2014

Given a social network $G$ and a constant $k$, the influence maximization problem asks for $k$ nodes in $G$ that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem has important applications in viral marketing and has been studied extensively in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant-factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge theory and practice in influence maximization. On the theory side, we show that TIM runs in $O((k+\ell)(n+m)\log n/\epsilon^2)$ expected time, where $n$ and $m$ are the numbers of nodes and edges in $G$, and returns a $(1-1/e-\epsilon)$-approximate solution with probability at least $1-n^{-\ell}$. The time complexity of TIM is near-optimal under the IC model, as it is only a $\log n$ factor larger than the $\Omega(m+n)$ lower bound established in previous work (for fixed $k$, $\ell$, and $\epsilon$). Moreover, TIM supports the triggering model, a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in running time. In particular, when $k = 50$, $\epsilon = 0.2$, and $\ell = 1$, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.

Comment: Revised Sections 1, 2.3, and 5 to remove incorrect claims about reference [3]. Updated experiments accordingly. A shorter version of the paper will appear in SIGMOD 201
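To make the problem concrete, here is a minimal Python sketch of reverse-reachable (RR) set sampling under the IC model, the sampling idea that TIM builds on, followed by greedy maximum coverage. It is not TIM itself: the number of RR sets `theta` is a fixed, hypothetical input rather than being derived from $k$, $\ell$, and $\epsilon$, and none of TIM's heuristics are included. The graph is assumed to be a list of directed edges `(u, v, p)`, where `p` is the IC propagation probability; all function names are illustrative.

```python
import random
from collections import defaultdict


def sample_rr_set(nodes, in_edges, rng):
    """Sample one reverse-reachable set under the IC model.

    Pick a root uniformly at random, then traverse edges backwards; each
    incoming edge (u -> v) with probability p is kept independently.
    """
    root = rng.choice(nodes)
    visited = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, ()):
            if u not in visited and rng.random() < p:
                visited.add(u)
                stack.append(u)
    return visited


def greedy_seeds(nodes, edges, k, theta, seed=0):
    """Pick k seeds that greedily cover the most of theta sampled RR sets.

    Returns the seeds and the estimate n * (covered RR sets) / theta of
    their expected spread. Assumes k <= len(nodes).
    """
    rng = random.Random(seed)
    in_edges = defaultdict(list)
    for u, v, p in edges:
        in_edges[v].append((u, p))
    rr_sets = [sample_rr_set(nodes, in_edges, rng) for _ in range(theta)]

    # Map each node to the indices of the RR sets that contain it.
    covers = defaultdict(set)
    for i, rr in enumerate(rr_sets):
        for u in rr:
            covers[u].add(i)

    seeds, covered = [], set()
    for _ in range(k):
        # Choose the unselected node that covers the most new RR sets.
        best = max(
            (u for u in nodes if u not in seeds),
            key=lambda u: len(covers[u] - covered),
        )
        seeds.append(best)
        covered |= covers[best]
    return seeds, len(nodes) * len(covered) / theta
```

For example, on a star whose centre `0` reaches each of four leaves with `p = 1.0`, every RR set contains the centre, so `greedy_seeds` picks `[0]` for `k = 1` and estimates its spread as 5.0.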

arXiv.org e-Print Archive

CiteSeerX

An efficient algorithm for mapping vehicle trajectories onto road networks

Author: Tang Youze
Xiao Xiaokui
Zhu Andy Diwen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012

Modern mobile technology has enabled the collection of large-scale vehicle trajectories using GPS devices. As GPS measurements may contain errors, vehicle trajectories are often noisy. A common practice to alleviate this issue is to apply map-matching, i.e., to align vehicle trajectories with the road segments in a digitized road network. This paper presents an efficient solution to the map-matching problem that won the SIGSPATIAL CUP 2012. Given a road network, our solution first constructs a grid index on the road segments. For each point p on a vehicle trajectory, we employ the index to identify a candidate set of road segments that are close to p, and then refine the candidate set to select the segment that matches p with the highest probability. The selection of the best match is based on a metric that takes into account (i) the correlation between consecutive GPS measurements and (ii) the directions and shapes of the road segments. Experimental results on real vehicle trajectories and road networks demonstrate the effectiveness and efficiency of the proposed solution.
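The grid-index and candidate-lookup step can be sketched in Python as follows. This is a simplification under stated assumptions, not the authors' implementation: road segments are straight 2-D line segments in a planar coordinate system, `cell_size` and `radius_cells` are hypothetical tuning parameters, and the refinement step simply picks the geometrically nearest candidate, whereas the paper's metric also accounts for consecutive GPS measurements and segment direction and shape.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


class GridIndex:
    def __init__(self, segments, cell_size):
        self.segments = segments  # list of ((x1, y1), (x2, y2))
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for sid, (a, b) in enumerate(segments):
            # Conservatively register the segment in every cell that its
            # bounding box overlaps.
            x0, x1 = sorted((a[0], b[0]))
            y0, y1 = sorted((a[1], b[1]))
            for cx in range(self._cell(x0), self._cell(x1) + 1):
                for cy in range(self._cell(y0), self._cell(y1) + 1):
                    self.cells[(cx, cy)].append(sid)

    def _cell(self, coord):
        return math.floor(coord / self.cell_size)

    def candidates(self, p, radius_cells=1):
        """Segment ids registered in the cells around p's cell."""
        cx, cy = self._cell(p[0]), self._cell(p[1])
        found = set()
        for dx in range(-radius_cells, radius_cells + 1):
            for dy in range(-radius_cells, radius_cells + 1):
                found.update(self.cells.get((cx + dx, cy + dy), ()))
        return found

    def nearest(self, p, radius_cells=1):
        """Refine the candidates to the closest segment (None if empty)."""
        cands = self.candidates(p, radius_cells)
        if not cands:
            return None
        return min(sorted(cands),
                   key=lambda sid: point_segment_distance(p, *self.segments[sid]))
```

With the default `radius_cells=1`, any segment within `cell_size` of p is guaranteed to be a candidate, but farther segments may be missed; widening `radius_cells` trades lookup time for recall.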

Crossref

DR-NTU (Digital Repository of NTU)

HubPPR: Effective Indexing for Approximate Personalized PageRank

Author: Li Zengxiang
Tang Youze
Wang Sibo
Xiao Xiaokui
Yang Yin
Publication date: 01/01/2016

Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query Π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank, which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering result materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs among accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which return the k nodes with the highest PPR values with respect to a source s among a given set T of target nodes. Extensive experiments demonstrate that, compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedups for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.

MOE (Min. of Education, S’pore). Published version
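The PPR query Π(s, t) defined above can be estimated by plain Monte Carlo sampling, which the following Python sketch illustrates. It is neither HubPPR nor BiPPR. It assumes the common convention that the walk stops at each step with probability `alpha` (0.2 here, a hypothetical default) and otherwise moves to a uniformly random out-neighbor, and that a walk reaching a node with no out-neighbors stops there. The graph is a hypothetical adjacency dict `out_neighbors`.

```python
import random
from collections import Counter


def ppr_monte_carlo(out_neighbors, source, alpha=0.2, num_walks=10_000, seed=0):
    """Estimate PPR values from `source` as the fraction of walks ending at each node."""
    rng = random.Random(seed)
    endpoints = Counter()
    for _ in range(num_walks):
        node = source
        while True:
            if rng.random() < alpha:
                break  # the walk terminates here with probability alpha
            nbrs = out_neighbors.get(node, [])
            if not nbrs:
                break  # assumption: a walk at a dangling node stops there
            node = rng.choice(nbrs)
        endpoints[node] += 1
    return {t: count / num_walks for t, count in endpoints.items()}
```

On the two-node graph s → t with `alpha = 0.2`, a walk ends at t exactly when it does not stop at s, so the estimate of Π(s, t) should approach 1 − 0.2 = 0.8. The error of such estimates shrinks only as 1/√`num_walks`, which is why query-time sampling alone becomes costly for small PPR values.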

DR-NTU (Digital Repository of NTU)

Distribution of Triarrhena lutarioriparia and its reserve characteristics of nitrogen and phosphorus in Dongting Lake

Author: Guangyi Fu
Lincheng Jian
Nan Tang
Youze Xu
Yuanyuan Zhao
Zhonghao He
Publication venue: 'EDP Sciences'
Publication date: 09/02/2021

Triarrhena lutarioriparia, a typical and the most abundant macrophyte in the Dongting Lake wetland, was left abandoned after the papermaking industry in the lake basin was shut down. To provide a scientific basis for the precise management of T. lutarioriparia, this study investigated its distribution in Dongting Lake and its nutrient storage characteristics. Remote sensing interpretation showed that the total area of T. lutarioriparia in the Dongting Lake wetland was 58,450 ha, 48.31% of which was in the South Dongting Lake wetlands. Nutrient contents differed significantly among T. lutarioriparia tissues, in descending order: spikes (TN 27.90 mg/g, TP 3.46 mg/g) > leaves (TN 16.38 mg/g, TP 2.11 mg/g) > stems (TN 5.38 mg/g, TP 0.85 mg/g). The total P quantities in the tissues ranked as stems (560.26 t) > leaves (396.52 t) > spikes (284.67 t), while the total N quantities of the tissues were within the range of 2170.02 to 2801.3 t. It was estimated that about 7712.99 t of TN and 1241.45 t of TP were removed annually from Dongting Lake by reaping T. lutarioriparia. The nutrients stored in dead T. lutarioriparia tissues may have a non-negligible impact on the water quality of Dongting Lake.
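The annual removal estimates can be checked against the tissue totals. Assuming (our reading; the abstract does not state it) that reaping removes all three tissues, the TP figure should equal the sum of the three tissue pools, and the TN figure should lie between three times the lower and three times the upper end of the reported per-tissue range:

```latex
\begin{gathered}
\mathrm{TP}_{\text{reaped}} = 560.26 + 396.52 + 284.67 = 1241.45\ \text{t}\\
3 \times 2170.02 = 6510.06\ \text{t} \;\le\; \mathrm{TN}_{\text{reaped}} = 7712.99\ \text{t} \;\le\; 8403.9\ \text{t} = 3 \times 2801.3
\end{gathered}
```

Both checks hold, so the reported totals are consistent with whole-plant harvesting.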

EDP Sciences OAI-PMH repository (1.2.0)